April 29, 2017

Flu Data

Goal: Predictive Distributions for 3 Targets

  1. Peak Incidence
  2. Peak Timing
  3. Onset Timing

Base Models

  • We've Developed Several Models for This Task
    1. KDE: Kernel Density Estimation
      • Directly model distribution of season peak incidence/timing and onset timing
      • Distributions are not updated over the course of the season!
    2. KCDE: Kernel Conditional Density Estimation + Copulas (Ray et al., under review)
      • Get a joint distribution for incidence in all remaining weeks conditional on recent incidence and time of year
      • Integrate to get distributions for peak incidence/timing and onset timing
    3. SARIMA: Seasonal Auto-Regressive Integrated Moving Average model
      • Similar in spirit to KCDE, but with more parametric assumptions
  • Relative performance varies
    • across different seasons
    • with the week when we're making predictions
    • with model uncertainty
    • with recently observed incidence
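The "integrate" step listed for KCDE above can be approximated by simulation: draw whole-season incidence trajectories from the joint distribution, then read the targets off each draw. The sketch below is only an illustration under stated assumptions. The trajectories are made-up numbers rather than output from any of the models, and the function names are hypothetical. Onset timing is left out because it needs a threshold rule that these slides do not specify.

```python
from collections import Counter

def peak_targets(trajectory):
    """Return (peak_week_index, peak_incidence) for one trajectory.

    Ties are resolved in favor of the earliest week.
    """
    peak_incidence = max(trajectory)
    return trajectory.index(peak_incidence), peak_incidence

def peak_week_distribution(trajectories):
    """Empirical distribution of the peak week over sampled trajectories."""
    counts = Counter(peak_targets(t)[0] for t in trajectories)
    n = len(trajectories)
    return {week: c / n for week, c in sorted(counts.items())}

# Hypothetical sampled trajectories (one incidence value per week).
trajectories = [
    [1.0, 2.5, 4.0, 3.0, 1.5],
    [1.2, 3.8, 3.5, 2.0, 1.0],
    [0.9, 2.0, 4.4, 4.1, 2.2],
    [1.1, 2.2, 3.9, 4.6, 2.5],
]
dist = peak_week_distribution(trajectories)
```

With more draws, the same counting gives a smoother estimate of the peak timing distribution, and binning the second element of `peak_targets` does the same for peak incidence.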

Relative Performance Varies with the Week

Stacking: Weighted Sum of Predictive Distributions

  • We want a predictive distribution for \(y | \mathbf{x}\).
    • \(y\): the prediction target, e.g. season onset timing, peak timing, or peak incidence
    • \(\mathbf{x}\): covariates, e.g. time of year, recent incidence, weather, …
  • We have \(M = 3\) predictive distributions \(f_m(y | \mathbf{x})\) from different models
  • Combine with covariate-dependent weights \(\pi_m(\mathbf{x})\): \[f(y | \mathbf{x}) = \sum_{m = 1}^M \pi_m(\mathbf{x}) f_m(y | \mathbf{x})\]
  • We require \(\pi_m(\mathbf{x}) \geq 0\) and \(\sum_{m = 1}^M \pi_m(\mathbf{x}) = 1\) for each \(\mathbf{x}\). One approach: \[\pi_m(\mathbf{x}) = \frac{\exp\{\rho_m(\mathbf{x})\}}{\sum_{m' = 1}^M \exp\{\rho_{m'}(\mathbf{x})\}}\]
  • We estimate the functions \(\rho_m(\mathbf{x})\) via Gradient Tree Boosting (GTB)
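To make the weighting concrete, here is a minimal sketch of the two formulas above for a single \((\mathbf{x}, y)\) pair. The scores \(\rho_m(\mathbf{x})\) and component densities \(f_m(y | \mathbf{x})\) are hypothetical numbers chosen for illustration. In the actual method the scores come from gradient tree boosting, which this sketch does not implement.

```python
import math

def softmax(rho):
    """Map unconstrained scores rho_m to weights pi_m that are >= 0 and sum to 1."""
    shift = max(rho)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(r - shift) for r in rho]
    total = sum(exps)
    return [e / total for e in exps]

def stacked_density(rho, component_densities):
    """Evaluate f(y|x) = sum_m pi_m(x) * f_m(y|x) at one (x, y) pair."""
    weights = softmax(rho)
    return sum(w * f for w, f in zip(weights, component_densities))

# Hypothetical values for M = 3 models at one (x, y) pair.
rho = [0.0, 1.2, 0.4]     # scores rho_m(x), assumed for illustration
f_m = [0.05, 0.20, 0.10]  # component densities f_m(y|x), assumed for illustration

weights = softmax(rho)
density = stacked_density(rho, f_m)
```

Because the weights are non-negative and sum to 1, the stacked density always lies between the smallest and largest component density at that point.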

We'll Compare Several Variations

  1. Equal Weights (EW): \(\pi_m(\mathbf{x}) = 1/M\).
  2. Constant Weights (CW): \(\pi_m(\mathbf{x}) = \pi_m\).
  3. Feature-weighted (FW): \(\pi_m(\mathbf{x})\) depends on features including week of the season and model uncertainty for the KCDE and SARIMA models.
  4. Feature-weighted with regularization: \(\pi_m(\mathbf{x})\) is estimated with regularization and depends on:
    1. (FW-reg-w) week of the season;
    2. (FW-reg-wu) week of the season and model uncertainty for the KCDE and SARIMA models;
    3. (FW-reg-wui) week of the season, model uncertainty for the KCDE and SARIMA models, and incidence in the most recent week.
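The difference between the EW and CW variants can be sketched in a few lines. The example below fits constant weights by plain gradient ascent on the softmax scores. It assumes the objective is the mean log of the stacked density over held-out predictions, which these slides do not state, and it uses made-up component densities. It is not the gradient tree boosting procedure used for the feature-weighted variants, although those methods build on the same gradient with respect to \(\rho_m\).

```python
import math

def softmax(rho):
    shift = max(rho)
    exps = [math.exp(r - shift) for r in rho]
    total = sum(exps)
    return [e / total for e in exps]

def mean_log_score(weights, densities):
    """Mean log of the stacked density over all rows of component densities."""
    logs = [math.log(sum(w * f for w, f in zip(weights, row))) for row in densities]
    return sum(logs) / len(logs)

def fit_constant_weights(densities, steps=500, lr=0.5):
    """Gradient ascent on rho_m with pi = softmax(rho), starting from equal weights."""
    n_models = len(densities[0])
    rho = [0.0] * n_models
    for _ in range(steps):
        pi = softmax(rho)
        grad = [0.0] * n_models
        for row in densities:
            mix = sum(p * f for p, f in zip(pi, row))
            for k in range(n_models):
                # d/d rho_k of log(sum_m pi_m f_m) = pi_k * f_k / mix - pi_k
                grad[k] += pi[k] * row[k] / mix - pi[k]
        rho = [r + lr * g / len(densities) for r, g in zip(rho, grad)]
    return softmax(rho)

# Hypothetical component densities f_m(y_i | x_i): one row per prediction, one column per model.
densities = [
    [0.10, 0.30, 0.05],
    [0.08, 0.25, 0.10],
    [0.12, 0.20, 0.02],
    [0.05, 0.40, 0.15],
]
ew = [1.0 / 3.0] * 3
cw = fit_constant_weights(densities)
```

In this toy data the second model has the highest density in every row, so the fitted constant weights concentrate on it and score higher than equal weights. With real data the weights would be fit on training seasons only and compared on test seasons.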

Evaluation

"Public health actions informed by forecasts that later prove to be inaccurate can have negative consequences, including the loss of credibility, wasted and misdirected resources, and, in the worst case, increases in morbidity or mortality."

– Biggerstaff et al. BMC Infectious Diseases 2016; 16(1):357.

We're looking for two things:

  1. Good overall performance
  2. Consistency

Conclusions

  • All reasonable ensembles had overall performance similar to that of the best component model
  • FW-reg-w ensemble had more consistent performance across seasons than the component models
  • Limitations:
    • Only 3 component models – would prefer to have more diversity
    • Only 5 test-phase seasons
  • Future Directions:
    • Additional component models (additional predictive covariates, model disease transmission mechanisms)
    • Conditionally adaptive kernel conditional density estimation via ensemble framework

Relative performance varies with the week

Model Weight Functions Are Reasonable (Part 2)

Relative performance varies with model uncertainty